
Improve CGU creation algorithms #111900


Closed
wants to merge 8 commits

Conversation

nnethercote
Contributor

An attempt to improve things.

r? @ghost

It is small and has a single call site, and this change will facilitate
the next commit.
Three of the four methods in `DefaultPartitioning` are defined in
`default.rs`. But `merge_codegen_units` is defined in a separate module,
`merging`, even though it's less than 100 lines of code and roughly the
same size as the other three methods. (Also, the `merging` module
currently sits alongside `default`, when it should be a submodule of
`default`, adding to the confusion.)

In rust-lang#74275 this explanation was given:

> I pulled this out into a separate module since it seemed like we might
> want a few different merge algorithms to choose from.

But in the three years since, no additional merging algorithms have
appeared, and there is no mechanism for choosing between different
merging algorithms. (There is a mechanism,
`-Zcgu-partitioning-strategy`, for choosing between different
partitioning strategies, but the merging algorithm is just one piece of
a partitioning strategy.)

This commit merges `merging` into `default`, making the code easier to
navigate and read.
I find that these structs obfuscate the code. Removing them and just
passing the individual fields around makes the `Partition` method
signatures a little longer, but makes the data flow much clearer (see
the sketch after this list). For example:

- `codegen_units` is mutable all the way through.
- `codegen_units`'s length is changed by `merge_codegen_units`, but only
  the individual elements are changed by `place_inlined_mono_items` and
  `internalize_symbols`.
- `roots`, `internalization_candidates`, and `mono_item_placements` are
  all immutable after creation, and all used by just one of the four
  methods.
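
To make the "pass the fields individually" point concrete, here is a minimal sketch of the shape of the change. Every type and function name below is a placeholder chosen for illustration, not the actual rustc signatures:

```rust
// Placeholder types standing in for the real rustc ones.
struct CodegenUnit;        // stand-in for rustc's CodegenUnit<'tcx>
struct MonoItem;           // stand-in for rustc's MonoItem<'tcx>
struct MonoItemPlacement;  // stand-in for the placement bookkeeping

// Before: all state bundled into one struct that every phase receives.
struct PreInliningPartitioning {
    codegen_units: Vec<CodegenUnit>,
    roots: Vec<MonoItem>,
    internalization_candidates: Vec<MonoItem>,
}

fn merge_codegen_units_old(_partitioning: &mut PreInliningPartitioning) {
    // Which fields are read or mutated is invisible at the call site.
}

// After: each piece is passed explicitly, so usage and mutability are
// visible in the signatures themselves.
fn merge_codegen_units_new(_codegen_units: &mut Vec<CodegenUnit>) {
    // Takes `&mut Vec<_>` because the *number* of CGUs changes here.
}

fn place_inlined_mono_items_new(_codegen_units: &mut [CodegenUnit]) -> Vec<MonoItemPlacement> {
    // Takes `&mut [_]`: only the individual elements are mutated.
    Vec::new()
}

fn internalize_symbols_new(
    _codegen_units: &mut [CodegenUnit],
    _mono_item_placements: &[MonoItemPlacement],
    _internalization_candidates: &[MonoItem],
) {
    // The immutable inputs are consumed only here, which is now obvious.
}
```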
This is something that took me some time to figure out.
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 24, 2023
@nnethercote nnethercote marked this pull request as draft May 24, 2023 07:08
@nnethercote
Contributor Author

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 24, 2023
@bors
Collaborator

bors commented May 24, 2023

⌛ Trying commit 729a48897e17b149656fe315c7941bd3c3df2a89 with merge 13166eef596f11ec3a87d7de3557a4d310cad482...

@bors
Collaborator

bors commented May 24, 2023

☀️ Try build successful - checks-actions
Build commit: 13166eef596f11ec3a87d7de3557a4d310cad482

@rust-timer
Collaborator

Finished benchmarking commit (13166eef596f11ec3a87d7de3557a4d310cad482): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

                             mean     range               count
Regressions ❌ (primary)      97.4%   [0.5%, 6569.2%]     236
Regressions ❌ (secondary)     6.1%   [0.7%, 24.5%]       250
Improvements ✅ (primary)     -1.8%   [-1.8%, -1.8%]        1
Improvements ✅ (secondary)      -    -                     0
All ❌✅ (primary)             97.0%   [-1.8%, 6569.2%]    237

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

                             mean     range               count
Regressions ❌ (primary)       6.2%   [1.3%, 35.8%]       217
Regressions ❌ (secondary)     6.7%   [1.5%, 16.9%]       252
Improvements ✅ (primary)    -15.9%   [-34.4%, -2.7%]       8
Improvements ✅ (secondary)      -    -                     0
All ❌✅ (primary)              5.4%   [-34.4%, 35.8%]     225

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

                             mean     range               count
Regressions ❌ (primary)      88.8%   [0.8%, 5590.4%]     234
Regressions ❌ (secondary)     7.8%   [0.9%, 37.7%]       248
Improvements ✅ (primary)     -1.0%   [-1.0%, -1.0%]        1
Improvements ✅ (secondary)      -    -                     0
All ❌✅ (primary)             88.4%   [-1.0%, 5590.4%]    235

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

                             mean     range               count
Regressions ❌ (primary)      13.7%   [0.0%, 72.9%]        57
Regressions ❌ (secondary)     6.0%   [0.4%, 30.3%]         6
Improvements ✅ (primary)     -0.1%   [-0.1%, -0.1%]        1
Improvements ✅ (secondary)      -    -                     0
All ❌✅ (primary)             13.5%   [-0.1%, 72.9%]       58

Bootstrap: 647.507s -> 688.121s (6.27%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels May 24, 2023
@nnethercote
Contributor Author

nnethercote commented May 25, 2023

Terrible results here. Some make sense, some don't.

  • The gigantic regressions are for incr-unchanged and incr-patched. As written, the CGU splitting I added is non-deterministic because it depends on hash table iteration order. This effectively breaks incremental compilation for the backend, because the CGUs formed are completely different even when the code is unchanged. Fixing this should be straightforward (see the sketch after this list).
  • But there are also lots of large regressions (up to 35% walltime) even for check-full builds. This doesn't make sense, because check builds shouldn't be affected by CGU formation. And I can't replicate these locally.
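
As an illustration of the determinism point (not the code in this PR): splitting a CGU by iterating a hash map directly produces a different split on every run, whereas sorting by a stable key first makes the output a pure function of the input. The names and the size-based splitting scheme below are assumptions made for the sketch:

```rust
use std::collections::HashMap;

/// Hypothetical illustration only. Splitting an oversized CGU by iterating
/// a hash map directly visits items in an unspecified order, so the
/// resulting CGUs differ between otherwise identical runs and incremental
/// reuse in the backend is lost. Sorting by a stable key first makes the
/// split deterministic.
fn split_items(items: &HashMap<String, u64>, max_size: u64) -> Vec<Vec<String>> {
    // Impose a stable order before splitting; without this sort the
    // contents of each chunk would depend on hash-table iteration order.
    let mut sorted: Vec<_> = items.iter().collect();
    sorted.sort_by(|a, b| a.0.cmp(b.0));

    let mut cgus: Vec<Vec<String>> = vec![Vec::new()];
    let mut current_size = 0u64;
    for (name, size) in sorted {
        if current_size + size > max_size && !cgus.last().unwrap().is_empty() {
            cgus.push(Vec::new());
            current_size = 0;
        }
        cgus.last_mut().unwrap().push(name.clone());
        current_size += size;
    }
    cgus
}
```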

@nnethercote
Contributor Author

I fixed the non-determinism issue, so the perf results for incremental builds should now be less catastrophic.

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 25, 2023
@bors
Collaborator

bors commented May 25, 2023

⌛ Trying commit 5005ebb with merge 2cf271b196764e749f738e7085ae01a46998d8ce...

@bors
Collaborator

bors commented May 25, 2023

☀️ Try build successful - checks-actions
Build commit: 2cf271b196764e749f738e7085ae01a46998d8ce

@rust-timer
Collaborator

Finished benchmarking commit (2cf271b196764e749f738e7085ae01a46998d8ce): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

                             mean     range               count
Regressions ❌ (primary)      24.7%   [0.2%, 908.3%]       82
Regressions ❌ (secondary)     0.8%   [0.2%, 14.3%]        58
Improvements ✅ (primary)     -1.0%   [-3.1%, -0.3%]        5
Improvements ✅ (secondary)   -1.1%   [-2.1%, -0.6%]       17
All ❌✅ (primary)             23.2%   [-3.1%, 908.3%]      87

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

                             mean     range               count
Regressions ❌ (primary)       4.7%   [0.7%, 20.0%]        29
Regressions ❌ (secondary)     1.8%   [1.8%, 1.8%]          1
Improvements ✅ (primary)    -14.5%   [-36.8%, -3.2%]       9
Improvements ✅ (secondary)   -2.8%   [-3.1%, -2.4%]        2
All ❌✅ (primary)              0.2%   [-36.8%, 20.0%]      38

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

                             mean     range               count
Regressions ❌ (primary)      68.5%   [1.1%, 862.2%]       27
Regressions ❌ (secondary)     4.4%   [1.5%, 15.6%]         7
Improvements ✅ (primary)     -2.0%   [-3.2%, -1.2%]        5
Improvements ✅ (secondary)   -8.8%   [-14.0%, -4.6%]      10
All ❌✅ (primary)             57.5%   [-3.2%, 862.2%]      32

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

                             mean     range               count
Regressions ❌ (primary)      10.4%   [0.0%, 47.0%]        56
Regressions ❌ (secondary)     3.7%   [0.3%, 17.8%]         6
Improvements ✅ (primary)     -0.1%   [-0.1%, -0.1%]        1
Improvements ✅ (secondary)      -    -                     0
All ❌✅ (primary)             10.2%   [-0.1%, 47.0%]       57

Bootstrap: 646.176s -> 659.34s (2.04%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 25, 2023
@nnethercote
Contributor Author

New results are better, but still terrible.

@nnethercote
Contributor Author

For posterity: the splitting did result in much more evenly sized CGUs. Here are some ratios between the smallest and largest CGU for various benchmarks (see the sketch after the tables for how to read them). Clearly, CGU size is much less important than other characteristics (inlining?) when it comes to compiler perf.

-----------------------------------------------------------------------------
opt
-----------------------------------------------------------------------------
        post-merge      final
        old  new        old  new
cargo:  4.7  1.0        4.0  1.9
crane:  2.5  1.0        2.4  1.7
dies:  29.5  2.6       46.4  8.6
ripg:   4.7  1.1        3.2  1.9
webr:   3.5  1.0        4.2  2.1

-----------------------------------------------------------------------------
debug
-----------------------------------------------------------------------------
        post-merge      final
        old    new      old    new
cargo:  480.9  1.1      480.0  1.1
crane:  760.0  9.4      760.0  9.5
diesel: 375.5  ditto    375.5  ditto
ripg:   12397  ditto    12413  ditto
webr:   1000   1.6       1000  2.0
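
For clarity about what the numbers above mean: each entry is the size of the largest CGU relative to the smallest, so 1.0 means perfectly even CGUs. A minimal sketch of that statistic, with a hypothetical `size_ratio` helper rather than the code used to produce the tables:

```rust
// Hypothetical sketch of the statistic in the tables above: the largest
// estimated CGU size divided by the smallest, where 1.0 means perfectly
// even CGUs.
fn size_ratio(cgu_sizes: &[u64]) -> f64 {
    let largest = *cgu_sizes.iter().max().expect("at least one CGU");
    let smallest = *cgu_sizes.iter().min().expect("at least one CGU");
    largest as f64 / smallest as f64
}

fn main() {
    // E.g. estimated sizes 100, 350, and 470 give a ratio of 4.7.
    println!("{:.1}", size_ratio(&[100, 350, 470]));
}
```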

Labels
perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.